This web report includes descriptive statistics of the Seattle 911 CAD data. The report starts with an overall summary of the structure of the dataset and then steps through each variable in the dataset.

Dataset Description

Let’s start by identifying the dimensions in the dataset.

## [1] 752421     17

There are 752,421 events and 17 variables in the data. The variable names from the CAD export are listed below.

##  [1] "CAD_Event_ID"                        "Dispatch_ID"                        
##  [3] "Event_First_Dispatch_Time_ATTR"      "Call_Priority_Code"                 
##  [5] "Call_Type_Desc"                      "Case_Type_Final_Desc"               
##  [7] "Case_Type_Initial_Desc"              "Clear_By_Desc"                      
##  [9] "Dispatch_Address"                    "Officer_Serial_Num"                 
## [11] "Precinct"                            "Sector"                             
## [13] "Squad_Desc"                          "Dispatch_Blurred_Latitude"          
## [15] "Dispatch_Blurred_Longitude"          "CAD_Event_Response_Time_Seconds_SUM"
## [17] "Total_Service_Time_Seconds_SUM"

Now, let’s find the number of categories in the categorical variables. In subsequent sections, I will step through each variable and summarize the distributions in greater detail.

##            Dispatch_ID     Call_Priority_Code         Call_Type_Desc 
##                 719509                      9                      8 
##   Case_Type_Final_Desc Case_Type_Initial_Desc          Clear_By_Desc 
##                    343                    235                     23 
##               Precinct                 Sector 
##                      6                     17

Dispatch ID - This is some sort of identifier. With 719,509 distinct values across 752,421 events, the identifiers are not unique to each event. What does the dispatch ID identify? Is this identifier going to be relevant for our analysis?

Call priority codes and Call type description have a manageable number of categories - 9 and 8, respectively. After taking a deeper dive into the univariate statistics in the sections below and understanding what these categories mean, we can decide whether any of these categories should be aggregated.

Case Type Final and Case Type Initial Descriptions - These two variables have the greatest number of categories with 343 and 235 categories, respectively. We will want to parse out the categories and see how to regroup into a smaller, more manageable set of categories for analysis. After looking over the categories we can figure out some strategies for aggregating categories.

Clear by description - There are 23 categories in this variable. After further review below, we can look to see if any aggregation is necessary.

Precinct is a categorical spatial indicator with 6 categories. It looks like the city is divided into five regional precincts, with a sixth category (UNKNOWN) for events that were not assigned to one.

Sector - There are 17 sectors. This variable appears to be another spatial category related to precinct. This will be described in the sector section below.
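
The category counts discussed above can be reproduced with a simple distinct-value count per column. The sketch below is a minimal illustration in Python, not the code used to produce this report; the four sample rows and the helper names `n_distinct` and `repeated_values` are hypothetical.

```python
from collections import Counter

# Hypothetical miniature sample; the real export has 752,421 rows and 17 columns.
events = [
    {"Dispatch_ID": "D1", "Call_Priority_Code": "2", "Precinct": "WEST"},
    {"Dispatch_ID": "D1", "Call_Priority_Code": "2", "Precinct": "WEST"},
    {"Dispatch_ID": "D2", "Call_Priority_Code": "1", "Precinct": "NORTH"},
    {"Dispatch_ID": "D3", "Call_Priority_Code": "7", "Precinct": "UNKNOWN"},
]


def n_distinct(rows, column):
    """Number of distinct values observed in one column."""
    return len({row[column] for row in rows})


def repeated_values(rows, column):
    """Values that appear on more than one row, with their row counts."""
    counts = Counter(row[column] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}


distinct_counts = {col: n_distinct(events, col) for col in events[0]}
shared_dispatch_ids = repeated_values(events, "Dispatch_ID")
print(distinct_counts)
print(shared_dispatch_ids)
```

Listing the repeated values is one way to start answering what a shared Dispatch ID represents, since the rows that share an identifier can then be inspected side by side.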

Before diving into the distributions of the categorical variables in greater detail, let’s take advantage of the fact that the data are time-stamped and get a sense of the frequency of events throughout the year.

Event Dates & Times

The data are time stamped to the minute. In the graph below, I have displayed the frequency of events per day. Hover your mouse over the line graph to see the number of events that occurred on a given day.

The date with the highest number of events was July 13th, with 2,530 events recorded. In general, the late spring and summer months appear to have higher frequencies than the rest of the year.

November 14th, 2019 is the date with the most marked decrease in events. There were only 76 events recorded on November 14th. This is far below even the other low-count days, as shown in Table 1 below. It raises the possibility of a glitch in the reporting system for that day.

Table 1: Dates with the Most and Fewest Events
Date # Events Rank
2019-07-13 2,530 1
2019-05-30 2,512 2
2019-05-31 2,509 3
2019-06-14 2,482 4
2019-06-01 2,470 5
2019-03-15 2,469 6
2019-05-10 2,468 7
2019-12-13 2,464 8
2019-04-26 2,457 9
2019-05-02 2,430 10
2019-12-24 1,571 356
2019-03-10 1,566 357
2019-02-09 1,561 358
2019-11-17 1,560 359
2019-11-28 1,501 360
2019-02-10 1,467 361
2019-12-25 1,445 362
2019-02-03 1,434 363
2019-11-13 640 364
2019-11-14 76 365

On average, there were 2,061 events per day in 2019. The mean and the median (2,076) are close, so with the exception of the 76-event day on November 14th, there is not much skewness in the distribution of events throughout the year.

Table 2: Events Over Time Summary
Daily Avg Std. Dev Median
2,061.427 238.363 2,076
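
To make the "far below other days" observation concrete, here is a minimal Python sketch that counts events per day and flags days more than three standard deviations below the daily mean. The mean and standard deviation are the values reported in Table 2 and the daily counts are the ten lowest days from Table 1. The timestamp format, the three-standard-deviation cutoff, and the function names are assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Daily mean and standard deviation as reported in Table 2.
DAILY_MEAN = 2061.427
DAILY_SD = 238.363

# The ten lowest-count days from Table 1 (date -> number of events).
lowest_days = {
    "2019-12-24": 1571, "2019-03-10": 1566, "2019-02-09": 1561,
    "2019-11-17": 1560, "2019-11-28": 1501, "2019-02-10": 1467,
    "2019-12-25": 1445, "2019-02-03": 1434, "2019-11-13": 640,
    "2019-11-14": 76,
}


def events_per_day(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count events per calendar day; the timestamp format is an assumption."""
    return Counter(datetime.strptime(t, fmt).date().isoformat() for t in timestamps)


def flag_low_days(daily_counts, mean, sd, k=3.0):
    """Dates whose count is more than k standard deviations below the mean."""
    threshold = mean - k * sd
    return sorted(day for day, n in daily_counts.items() if n < threshold)


flagged = flag_low_days(lowest_days, DAILY_MEAN, DAILY_SD)
print(flagged)
```

With these inputs the cutoff is roughly 1,346 events, so only November 13th and 14th fall below it; the next-lowest day (1,434 events) does not.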

Call Priority Codes

Code 2 is the most common priority code, with a total of 219,406 events in 2019. According to Table 3, Code 2 accounts for about 29% of the events in 2019. Just over three-quarters of the events (about 76%) are categorized as priority codes 1 through 3.

Codes 6 and -1 have the fewest events. They do not show up clearly in the graph, but Table 3 below shows that they total 44 and 274 events, respectively. I dug into the -1 code a little more, and it looks like this code is applied to very specific cases. All 274 of these events had the call type listed as Onview and had the initial case type description of “DOWN - CHECK FOR DOWN PERSON”. You can flip through the paged table below to see the events with -1 priority codes.

One other point to note is that there is not a code 8; the codes skip from 7 to 9.
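
As a quick arithmetic check on the shares quoted above, the following Python sketch recomputes them from the counts in Table 3. The counts are copied from the table; the variable names are only illustrative.

```python
# Event counts per priority code, copied from Table 3.
code_counts = {
    -1: 274, 1: 162377, 2: 219406, 3: 191722, 4: 15085,
    5: 6793, 6: 44, 7: 131411, 9: 25309,
}

total_events = sum(code_counts.values())

# Percentage share of each code, rounded as in Table 3.
shares = {code: round(100 * n / total_events, 2) for code, n in code_counts.items()}

# Combined share of codes 1 through 3.
share_1_to_3 = round(100 * sum(code_counts[c] for c in (1, 2, 3)) / total_events, 2)

# Positive codes between 1 and 9 that never occur in the data.
missing_codes = [c for c in range(1, 10) if c not in code_counts]

print(total_events, shares[2], share_1_to_3, missing_codes)
```

The counts sum to the 752,421 events in the dataset, which confirms that no event is missing a priority code.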

Table 3: Call Priority Codes
Code # Events %
-1 274 0.04
1 162,377 21.58
2 219,406 29.16
3 191,722 25.48
4 15,085 2.00
5 6,793 0.90
6 44 0.01
7 131,411 17.47
9 25,309 3.36
Table of -1 Priority Code Events

Call Type Description

Table 4: Call Type Description
Type # Events %
911 325,008 43.19
ONVIEW 245,444 32.62
TELEPHONE OTHER, NOT 911 155,803 20.71
ALARM CALL (NOT POLICE ALARM) 25,606 3.40
TEXT MESSAGE 429 0.06
PROACTIVE (OFFICER INITIATED) 54 0.01
SCHEDULED EVENT (RECURRING) 66 0.01
IN PERSON COMPLAINT 11 0.00

911 calls account for about 43% of the events. Onview events and non-911 telephone calls are the second and third most common types, and together they make up just over 50% (about 53%) of the events. Text messages, officer-initiated (proactive) events, scheduled recurring events, and in-person complaints are negligible sources, each well under 0.1% of the events.

Questions
  1) Is the plan to focus solely on 911 call types? (If so, the question below is not relevant.)
  2) What is the difference between Onview and Proactive (Officer Initiated)? Is it possible that these two types would be worth combining?

Case Type Final Description

Flip through the pages in the table to view the number of events with each type of case final description. Recall that this variable has 343 different descriptions.

Some of these descriptions have a general description followed by a more specific description that follows a dash. We could parse on the general description and then aggregate to get a smaller set of categories. I demonstrate this in the table below.

This aggregation strategy reduced the number of categories to 153. Traffic-related cases are the most common, followed by disturbance and suspicious circumstances. If you flip through the pages, there are some categories that also appear to be similar to these top 3. For instance, traffic stop is listed on page 6, which seems like it could also fit under traffic. Also on page 6 is the category suspicious stop, which seems related to suspicious circumstances. All of the descriptions and frequencies for the final case type descriptions are listed in the exported Excel file.

Other Comments
  • Need to make sure to catch abbreviations using regular expressions (e.g., burg -> burglary).
  • Similarly, use regular expressions for categories that look alike but differ in spacing or wording (e.g., Arson, Bombs, Explo; Abandoned car & Abandoned vehicle).
  • “#NAME?” looks like it might be the classification for events that were not classified. There are 19,900 events with this classification, which is about 2.64% of the events.
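
The parse-and-aggregate strategy and the regular-expression clean-up noted above could be sketched as follows in Python. This is an illustration under stated assumptions, not the code behind the tables: the abbreviation map, the "UNCLASSIFIED" label for “#NAME?”, and the rule of splitting only on a dash surrounded by spaces are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical abbreviation map; the real list would come from reviewing
# the exported descriptions.
ABBREVIATIONS = {r"\bBURG\b": "BURGLARY", r"\bHAZ\b": "HAZARD"}


def general_category(description):
    """Return the general part of a 'GENERAL - SPECIFIC' description."""
    # Normalize case and collapse repeated whitespace.
    text = re.sub(r"\s+", " ", description.strip().upper())
    if text == "#NAME?":
        return "UNCLASSIFIED"  # assumed label for unclassified events
    # Split on a dash surrounded by spaces so hyphenated words stay intact.
    head = re.split(r" -+ ", text, maxsplit=1)[0]
    for pattern, replacement in ABBREVIATIONS.items():
        head = re.sub(pattern, replacement, head)
    return head


sample = [
    "TRAFFIC - MOVING VIOLATION",
    "TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)",
    "DISTURBANCE - OTHER",
    "#NAME?",
]
category_counts = Counter(general_category(d) for d in sample)
print(category_counts.most_common())
```

Descriptions without a spaced dash, such as "ASSAULTS, OTHER", pass through unchanged, which is why look-alike categories still need a separate review after this step.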

Case Type Initial Descriptions

The top three or four initial case type descriptions occur at about the same frequency. The same four descriptions are also the top four in the final description, but the ordering differs.

One note on structure: fewer of these descriptions follow the pattern noted in the final descriptions, that is, a general description followed by a more specific detail, with the two separated by a dash “-”. Below, I have parsed the descriptions as I did with the final case descriptions; however, it may be a less useful approach for this variable.

Other Comments/Questions
  • Need to make sure to catch abbreviations using regular expressions (e.g., HAZ -> HAZARD).
  • “#NAME?” shows up again in this set of descriptions, though not as frequently as it did in the final descriptions (n=12,132).
  • Would it be useful to compare final and initial descriptions? We could use some fuzzy matching and regular expressions if this is something important. If final descriptions are missing (meaning that they are coded as #NAME?) and initial descriptions are not missing, should the initial description be applied?
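
If the answer to the last question is yes, the fallback and the fuzzy comparison could look something like the Python sketch below. It uses `difflib.SequenceMatcher` from the standard library as one possible similarity measure; the function names are hypothetical, and whether the initial description should be applied at all remains an open question.

```python
from difflib import SequenceMatcher

MISSING = "#NAME?"  # value that appears to mark unclassified events


def best_description(final_desc, initial_desc):
    """Prefer the final description; fall back to the initial one.

    Returns None when both descriptions are missing.
    """
    if final_desc != MISSING:
        return final_desc
    if initial_desc != MISSING:
        return initial_desc
    return None


def similarity(a, b):
    """Case-insensitive similarity ratio between 0 and 1."""
    return SequenceMatcher(None, a.upper(), b.upper()).ratio()


print(best_description("#NAME?", "DISTURBANCE - OTHER"))
print(similarity("Abandoned car", "Abandoned vehicle"))
```

A similarity threshold for treating two descriptions as the same category would have to be chosen by inspecting real pairs; none is suggested here.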

Aggregating reduced the number of descriptions down to 125. The top four descriptions remain the same, but the rest of the top 10 have shifted ranks (e.g., assault, trespass).

Clear by Description

Table 4: Clear By Descriptions
Description # Events %
ASSISTANCE RENDERED 286,250 38.04
REPORT WRITTEN (NO ARREST) 169,102 22.47
PHYSICAL ARREST MADE 63,198 8.40
UNABLE TO LOCATE INCIDENT OR COMPLAINANT 57,973 7.70
CITATION ISSUED (CRIMINAL OR NON-CRIMINAL) 34,242 4.55
NO POLICE ACTION POSSIBLE OR NECESSARY 23,461 3.12
ORAL WARNING GIVEN 23,282 3.09
FALSE COMPLAINT/UNFOUNDED 18,688 2.48
PROBLEM SOLVING PROJECT 17,929 2.38
OTHER REPORT MADE 17,389 2.31
RESPONDING UNIT(S) CANCELLED BY RADIO 11,537 1.53
FOLLOW-UP REPORT MADE 9,320 1.24
STREET CHECK WRITTEN 7,023 0.93
DUPLICATED OR CANCELLED BY RADIO 5,925 0.79
- 2,626 0.35
INCIDENT LOCATED, PUBLIC ORDER RESTORED 2,406 0.32
RADIO BROADCAST AND CLEAR 653 0.09
TRANSPORTATION OR ESCORT PROVIDED 535 0.07
SERVICE OF DVPA ORDER 554 0.07
NON-CRIMINAL REFERRAL 216 0.03
(NOT CURRENTLY USED) ALARM NO RESPONSE 39 0.01
EXTRA UNIT 54 0.01
NO SUCH ADDRESS OR LOCATION 19 0.00
  • It looks like a dash “-” represents missing clear by description (n=2,626).
  • There are some descriptions that I do not know what they mean or how they differ from other descriptions. For instance, how are responding units canceled by radio and duplicated or canceled by radio different?
  • Unable to locate incident or complainant is about 7.7% of the events.

Precinct & Sector

Table 5: Events by Precinct
Precinct # Events %
WEST 219,114 29.12
NORTH 190,225 25.28
SOUTH 133,006 17.68
EAST 120,004 15.95
SOUTHWEST 84,140 11.18
UNKNOWN 5,932 0.79

The West precinct had the most events in 2019, with about 29% of the events occurring in the precinct. North was the next most common, comprising about 25% of the events. Among the named precincts, Southwest had the fewest events recorded, with 84,140 events in 2019.

For 5,932 of the events, the precinct is unknown. We may be able to identify a precinct for these events if they have valid latitude and longitude coordinates. Let’s look to see if they do have lat and long:

Table 6: Unknown Precinct Coordinate Status
Coordinate Status # Events
Not valid coords 4,006
Valid coords 1,926

The majority of the events with unknown precincts (4,006 of 5,932) do not have coordinates that are within the extent of Seattle/King County, Washington. We can, however, use the 1,926 events with unknown precincts whose coordinates do fall within the geographic extent of Seattle. When I create a spatial object from the coordinates, as shown a few sections below, I will be able to plot these. For some it may be obvious what the precinct is based on the precinct labels given to neighboring events. If the precinct classification is not obvious, the best thing to do would be to obtain a shapefile of the polygons for each of the five precincts, overlay it on the events, and give each point the name of the precinct polygon that it falls within. Seattle’s Open Data website has such a shapefile that I will call on and use in the spatial geoprocessing section below.
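
The overlay idea described above, giving each point the name of the polygon it falls within, comes down to a point-in-polygon test. The Python sketch below implements the standard ray-casting version of that test with toy square boundaries. The polygons are hypothetical placeholders, not real precinct shapes, and a real analysis would more likely rely on a spatial library and the official shapefile.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: True if (lon, lat) lies inside the polygon.

    The polygon is a list of (lon, lat) vertices in order.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def assign_precinct(lon, lat, precinct_polygons):
    """Name of the first polygon containing the point, or None if outside all."""
    for name, polygon in precinct_polygons.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Hypothetical toy boundaries (two adjacent unit squares).
toy_precincts = {
    "WEST": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "EAST": [(1, 0), (2, 0), (2, 1), (1, 1)],
}

print(assign_precinct(0.5, 0.5, toy_precincts))
print(assign_precinct(3.0, 3.0, toy_precincts))
```

Points that return None correspond to events whose coordinates fall outside every precinct boundary.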

There are some interesting bivariate analyses that could be explored, for example call priority codes by precinct. View the interactive stacked bar chart below.

A few things stand out in the stacked bar graph of call priority codes and precincts.
  • Just under half of the events with the specialized code -1 were in the West precinct.
  • The North and West precincts had very similar shares of events in codes 1 through 3. In each of these codes, the cases in the North and West total just over 50% of the cases with that code.
  • Just over 40% of the events classified as code 6 are in the North precinct.
  • About 45% of code 9 events are in the West precinct, which is similar to the share of code -1 events.

Let’s turn to the sectors. There are 17 distinct sector names, and 5,932 events were not given a sector. These are the same events that are missing a precinct classification.

Table 7: Events by Precinct-Sector
Precinct Sector # Events Percent
SOUTH OCEAN 53,050 39.89
SOUTH ROBERT 42,067 31.63
SOUTH SAM 37,889 28.49
EAST EDWARD 57,875 48.23
EAST GEORGE 31,724 26.44
EAST CHARLIE 30,405 25.34
SOUTHWEST FRANK 43,161 51.30
SOUTHWEST WILLIAM 40,979 48.70
WEST KING 76,274 34.81
WEST MARY 57,260 26.13
WEST DAVID 47,126 21.51
WEST QUEEN 38,454 17.55
NORTH BOY 45,660 24.00
NORTH NORA 42,120 22.14
NORTH UNION 40,548 21.32
NORTH LINCOLN 33,979 17.86
NORTH JOHN 27,918 14.68
UNKNOWN NA 5,932 100.00

Sectors are unique to precincts. We can think of a sector as a subdivision of the precinct. In the table above, we see that within the South precinct, Ocean sector had about 40% of the events, whereas the other two sectors - Robert and Sam - had about 30% each.
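
Both claims above, that each sector belongs to exactly one precinct and that the percentages are shares within a precinct, can be checked mechanically. The Python sketch below does this for the South and Southwest rows of Table 7; the function names are illustrative.

```python
from collections import defaultdict

# (precinct, sector, events) rows copied from Table 7 for two precincts.
rows = [
    ("SOUTH", "OCEAN", 53050),
    ("SOUTH", "ROBERT", 42067),
    ("SOUTH", "SAM", 37889),
    ("SOUTHWEST", "FRANK", 43161),
    ("SOUTHWEST", "WILLIAM", 40979),
]


def sectors_nested_in_precincts(rows):
    """True if every sector appears under exactly one precinct."""
    precincts_by_sector = defaultdict(set)
    for precinct, sector, _ in rows:
        precincts_by_sector[sector].add(precinct)
    return all(len(p) == 1 for p in precincts_by_sector.values())


def within_precinct_share(rows):
    """Percentage of each precinct's events that fall in each sector."""
    totals = defaultdict(int)
    for precinct, _, n in rows:
        totals[precinct] += n
    return {(p, s): round(100 * n / totals[p], 2) for p, s, n in rows}


nested = sectors_nested_in_precincts(rows)
sector_shares = within_precinct_share(rows)
print(nested, sector_shares[("SOUTH", "OCEAN")])
```

The sector counts for South sum to 133,006 and those for Southwest to 84,140, matching the precinct totals in Table 5.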

The Edward sector had nearly half of the events in the East precinct.

The Southwest precinct’s events were relatively evenly divided between its two sectors - Frank and William.

The West precinct, which has the most events of all the precincts, has a wide spread in the number of events across its four sectors. The King sector had the most events (76,274, about 35%), and Queen sector had the fewest (38,454, about 18%).

The North precinct has 5 sectors. Boy, Nora, and Union sectors have similar shares of events within their boundaries. The other two sectors - Lincoln and John - make up just over 30% of the events in the precinct.

The Seattle Open Data website does not appear to have a boundary shapefile or API for sector. This may be something to inquire about if we want to do point-in-polygon analyses.

Squad Description

This is one of the variables with an unmanageable number of categories. Only 2,746 events are missing a squad description. If you flip through the pages of the table, you can see that the squad groups are named in various ways. Some are based on the field or area they work in (e.g., forensics, Arson/Bomb) and others are based on locations (i.e., precinct + sector). If this variable is considered important, we would need to approach the aggregation as we would for the case type descriptions: using the first descriptor before the dash, regular expressions, and fuzzy matching to form broad categories and to handle abbreviations, misspellings, and differences in word order.

Officer Identifier

## [1] 1342

There are 1,342 distinct officer serial numbers in this dataset.

Response Time

##     Min.  1st Qu.   Median     Mean  3rd Qu.     Max. 
##        0        0      259     1483      989 14773646

Response time for each event is reported in seconds. The summary statistics suggest that there are some very long response times that are outliers. The longest response time is 14,773,646 seconds, which is roughly 171 days. Let’s parse the seconds into larger units of time.

With the times parsed into periods and sorted from longest to shortest, we can see that the longest time was just under 171 days (170 days and 23 hours) and that the case was a test call. This is probably a candidate for exclusion. For completeness, the data are also displayed below sorted from shortest to longest, so that it is easier to see what the short response times are.
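
Parsing seconds into periods is repeated integer division. The Python sketch below shows one way to do it; the output format, which drops leading zero units, is an assumption modeled on the parsed values shown elsewhere in this report (for example "1H 22M 0S"), and the function names are hypothetical. It handles non-negative values only.

```python
def seconds_to_period(total_seconds):
    """Split a non-negative number of seconds into (days, hours, minutes, seconds)."""
    days, rem = divmod(total_seconds, 86_400)
    hours, rem = divmod(rem, 3_600)
    minutes, seconds = divmod(rem, 60)
    return days, hours, minutes, seconds


def format_period(total_seconds):
    """Format as e.g. '170d 23H 47M 26S', dropping leading zero units."""
    days, hours, minutes, seconds = seconds_to_period(total_seconds)
    parts = [(days, "d"), (hours, "H"), (minutes, "M"), (seconds, "S")]
    while len(parts) > 1 and parts[0][0] == 0:
        parts.pop(0)
    return " ".join(f"{value}{unit}" for value, unit in parts)


print(format_period(14_773_646))  # longest response time in the data
print(format_period(259))         # median response time
```

Applied to the maximum response time, this gives 170 days, 23 hours, 47 minutes, and 26 seconds; the median of 259 seconds is 4 minutes and 19 seconds.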

Total Service Time

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's 
##   -2953     504    1326    3072    3540  251587     281

The distribution for total service time on events is strange. There are 281 events missing a total service time. Additionally, there is at least one event that had a negative total service time recorded. First, let’s see how many negative values we have.

Table 8: Events with Negative Total Service Time
Date Service Time (Seconds) Response Time (Parsed) Case Type Final
2019-11-03 -2,953 0S TRAFFIC - MOVING VIOLATION
2019-11-03 -2,604 3H 41M 19S TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)
2019-11-03 -1,829 1H 22M 0S CRISIS COMPLAINT - GENERAL
2019-11-03 -1,829 1H 22M 0S CRISIS COMPLAINT - GENERAL
2019-11-03 -1,091 9M 1S DISTURBANCE - OTHER
2019-11-03 -998 4M 50S ASSAULTS, OTHER
2019-11-03 -699 2M 9S DISTURBANCE - OTHER
2019-11-03 -699 2M 9S DISTURBANCE - OTHER

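There are only 8 events in the dataset with negative values. When we include information like the event date, parsed response time, and case type description, we notice that two of the rows duplicate other rows (the -1,829 and -699 entries each appear twice). The other thing that stands out is that these events were all recorded on the same date, November 3rd. It is possible that the negative values were a recording error that occurred that day. November 3rd, 2019 was also the day daylight saving time ended in the United States, and every negative value is smaller in magnitude than one hour (3,600 seconds), so a clock shift is one possible explanation worth checking. We could also check the average service time on other events of a similar type to see if the absolute value of total service time is reasonable.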

Now, let’s look at the NA values.

The events with missing values vary in case type. There appear to be some duplicates, e.g., the assault-DV case on January 13th. Again, it seems like event date and response time would be useful for identifying duplicates and then de-duplicating this dataset.

For the sake of consistency, I parsed the total service time into time periods as I did with the response time. See some of the output below.

With the service time parsed into periods, we see that the upper end of the service time distribution is between 2 and 3 days (the maximum of 251,587 seconds is about 2.9 days). Notice that the six longest entries are missing case descriptions. There is also a duplicate among these six - the one on March 23rd. If you flip through the pages, you can spot more duplicates, which again suggests that de-duplicating on event date, time, and case type description would be most useful.
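
De-duplicating on a combination of fields can be sketched as below. The Python example uses the eight negative service time rows shown earlier and treats rows as duplicates when date, parsed response time, and case type all match. That key is an assumption for illustration; the full data would also allow the event time to be part of the key.

```python
# Rows copied from the table of negative total service times.
negative_rows = [
    {"date": "2019-11-03", "service": -2953, "response": "0S", "case": "TRAFFIC - MOVING VIOLATION"},
    {"date": "2019-11-03", "service": -2604, "response": "3H 41M 19S", "case": "TRAFFIC - PARKING VIOL (EXCEPT ABANDONED CAR)"},
    {"date": "2019-11-03", "service": -1829, "response": "1H 22M 0S", "case": "CRISIS COMPLAINT - GENERAL"},
    {"date": "2019-11-03", "service": -1829, "response": "1H 22M 0S", "case": "CRISIS COMPLAINT - GENERAL"},
    {"date": "2019-11-03", "service": -1091, "response": "9M 1S", "case": "DISTURBANCE - OTHER"},
    {"date": "2019-11-03", "service": -998, "response": "4M 50S", "case": "ASSAULTS, OTHER"},
    {"date": "2019-11-03", "service": -699, "response": "2M 9S", "case": "DISTURBANCE - OTHER"},
    {"date": "2019-11-03", "service": -699, "response": "2M 9S", "case": "DISTURBANCE - OTHER"},
]


def deduplicate(rows, key_fields):
    """Keep the first row for each distinct combination of key fields."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


unique_rows = deduplicate(negative_rows, ("date", "response", "case"))
print(len(negative_rows), "->", len(unique_rows))
```

Keeping the first occurrence preserves the original row order, which makes it easier to compare the result against the paged tables.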

Spatial Object

Before transforming the dataframe into a spatial object, the events with missing or invalid coordinates need to be removed. After filtering those events out, the transformed spatial object contains 675,664 locations. In total there are 76,757 events that do not have valid coordinates. Mapping all of these events as points results in over-plotting as shown below.

There are other approaches for visualizations that would be more informative. One approach is to create a point density map to show where the highest and lowest number of events per area occurred in the city. Another approach is to aggregate the points to meaningful geographic units like zipcodes or neighborhoods. The following sections demonstrate these approaches.

Point Density Mapping

This interactive map clusters points that are close to one another. Zoom into different parts of the city to see where clusters tend to occur.

Smoothed Point Density Map

The interactive map below shows the areas with high density of calls. Only areas with statistically significant densities are mapped. Highest density areas are in yellow and lowest are red.

Precinct Aggregation & Mapping

Now, let’s turn to visualizing the frequency of events in different geographic regions of Seattle. In one of the prior sections, I showed the frequency of events per precinct. However, 5,932 of the events did not have a precinct listed. Now that the dataframe has been transformed to a spatial object, I can identify a precinct for those locations based on which precinct each coordinate pair lies within.

Events per precinct, spatial overlay version
Precinct Events
WEST 192,022
NORTH 175,172
EAST 117,517
SOUTH 116,718
SOUTHWEST 72,039
NA 2,196

A couple of things stand out from using the spatial overlay approach to assign precincts. First, the number of points that are not assigned to a precinct decreased from 5,932 to 2,196. These points remain unassigned because they lie outside of the precinct boundaries.

Second, the number of events per precinct is lower than in the section above, where I used the precinct given in the data. This indicates that the over 70,000 events that were dropped due to invalid coordinates had been given precinct identifiers, but not valid latitudes and longitudes. If we want to preserve all of these events, the best thing to do would be to keep the precincts that were provided in the original dataset and then merge in the spatial overlay precincts for the subset of events that did not have a precinct listed. Let’s do that and then visualize the precincts and frequencies.
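
One way to sketch the merge described above is a per-event rule: keep the recorded precinct when there is one, and otherwise use the overlay result. The Python example below is a minimal illustration with hypothetical sample records; treating "UNKNOWN", an empty string, and None as missing is an assumption.

```python
MISSING_PRECINCTS = (None, "", "UNKNOWN")  # assumed markers for a missing precinct


def resolve_precinct(recorded, overlay):
    """Keep the recorded precinct; fall back to the overlay result if it is missing.

    The overlay value may itself be None when the event has no valid
    coordinates or lies outside every precinct boundary.
    """
    if recorded not in MISSING_PRECINCTS:
        return recorded
    return overlay


# Hypothetical sample events: (recorded precinct, overlay precinct).
sample_events = [
    ("WEST", "WEST"),      # both sources agree
    ("NORTH", None),       # no valid coordinates, recorded value kept
    ("UNKNOWN", "EAST"),   # gap filled from the overlay
    ("UNKNOWN", None),     # still unassigned
]

resolved = [resolve_precinct(rec, ovl) for rec, ovl in sample_events]
print(resolved)
```

Under this rule an event stays unassigned only when it has neither a recorded precinct nor coordinates that fall inside a precinct boundary.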

The map shows not only the events per precinct, but also those events that are outside of the precinct boundaries.

Zipcode Aggregation & Mapping

Another aggregation we can perform and visualize is at the zipcode level. Zipcode boundaries were pulled from Seattle’s Open Data website.

The table below lists the count of events per zipcode. There is a sizable range in events per zipcode from 3 to just under 70,000. The map below shows the counts per zipcode.

Events per zipcode
Zipcode Events
98104 69,976
98101 62,916
98122 53,863
98118 42,124
98103 35,657
98134 35,492
98144 33,627
98108 31,219
98109 29,184
98105 28,177
98125 27,136
98133 27,056
98121 23,777
98107 23,474
98106 22,049
98126 17,845
98115 17,316
98116 16,995
98102 16,262
98119 13,861
98112 13,444
98117 12,867
98136 6,588
98199 5,732
98178 3,316
98177 2,172
98146 1,377
98195 1,004
98155 641
98168 372
98188 35
98166 19
98026 6
98057 3
98148 3

Zipcodes in the core of the city tend to have the highest counts. The zipcodes just south of the city center have higher counts of events than the zipcodes in the north of the city.

Neighborhood Aggregation & Mapping

The Seattle Open Data website also makes neighborhood boundaries available. In the table below, the events were aggregated to the neighborhoods. This should tell us a little more than the zipcodes do about where events are most prevalent. The neighborhoods and their counts are also featured in the map below. We see that the neighborhoods in the city’s core like the CBD, Broadway, Pioneer Square, and Industrial District had the highest number of events.

Events per neighborhood
Neighborhood Events
Central Business District 45,749
Broadway 40,664
Pioneer Square 38,494
Industrial District 37,033
Industrial District 37,033
Belltown 32,937
University District 22,443
First Hill 18,220
International District 16,048
Greenwood 14,888
Lower Queen Anne 12,940
South Lake Union 12,940
North Beacon Hill 12,861
Adams 12,580
Columbia City 11,960
Georgetown 11,578
Haller Lake 11,571
Atlantic 11,352
Dunlap 10,912
Minor 10,617
North College Park 10,539
Fremont 10,421
Yesler Terrace 9,340
Pike-Market 9,120
West Woodland 8,539
Stevens 8,225
Wallingford 7,760
South Delridge 7,557
Pinehurst 7,375
South Park 7,299
Bitter Lake 7,257
Mount Baker 7,215
Brighton 7,032
Mid-Beacon Hill 6,718
Roxhill 6,656
High Point 6,476
Genesee 6,175
Olympic Hills 6,044
North Admiral 5,873
Cedar Park 5,774
Green Lake 5,570
Highland Park 5,423
Interbay 4,942
Alki 4,925
Maple Leaf 4,915
Roosevelt 4,878
South Beacon Hill 4,782
East Queen Anne 4,392
North Delridge 4,359
Ravenna 4,301
Holly Park 4,278
Fairmount Park 4,194
North Queen Anne 3,697
Mann 3,667
Phinney Ridge 3,404
Rainier Beach 3,181
Victory Heights 3,095
West Queen Anne 3,087
Seward Park 2,966
Riverview 2,845
Leschi 2,713
Loyal Heights 2,506
Whittier Heights 2,432
Broadview 2,391
Crown Hill 2,380
Lawton Park 2,353
Eastlake 2,281
Sunset Hill 2,217
Montlake 2,194
Fauntleroy 2,186
Westlake 2,163
Seaview 2,021
Gatewood 1,979
Wedgwood 1,902
Madrona 1,808
Rainier View 1,746
Southeast Magnolia 1,598
Matthews Beach 1,586
Meadowbrook 1,583
Arbor Heights 1,580
Bryant 1,459
Madison Park 1,375
Sand Point 1,335
Laurelhurst 1,167
Harrison/Denny-Blaine 1,101
North Beach/Blue Ridge 983
Briarcliff 866
View Ridge 711
Windermere 690
Harbor Island 656
Portage Bay 518